On Robust Arm-Acquiring Bandit Problems
Authors
Abstract
In the classical multi-armed bandit problem, at each stage the player must choose one of N given projects (arms) to play, generating a reward that depends on the arm played and its current state. The state process of each arm is modeled by a Markov chain whose transition probabilities are known a priori, and the player's goal is to maximize the expected total reward. One variant of the problem, the so-called arm-acquiring bandit, studies the case where new projects may arrive at each stage. Another recent extension of the classical bandit problem incorporates uncertainty in the transition probabilities. This robust control problem introduces an adversary, "nature", who aims to minimize the player's expected total reward by choosing a different transition probability measure each time after the player makes a decision. In this paper, we consider the robust arm-acquiring bandit problem, the combination of these two extensions, and show that there exists an optimal state-by-state retirement policy. An extension to the robust arm-acquiring tax problem, under a certain condition, is also presented.
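To make the max-min recursion concrete, the sketch below runs value iteration for a single-arm robust retirement problem: in each state the player either retires for a lump-sum reward M or continues, after which nature picks, from a finite set of candidate transition matrices, the one minimizing the player's continuation value. The function name, the finite uncertainty set, and the discounted formulation are illustrative assumptions of this sketch, not the paper's construction.

```python
import numpy as np

def robust_retirement_values(rewards, kernels, M, beta=0.9, tol=1e-8):
    """Value iteration for a single-arm robust retirement problem (sketch).

    rewards : (S,) array, immediate reward r(x) for continuing in state x.
    kernels : (K, S, S) array of candidate transition matrices; after each
              continuation, "nature" picks the kernel minimizing the player's
              expected continuation value, i.e. the max-min recursion
              V(x) = max(M, r(x) + beta * min_k sum_y P_k(x, y) V(y)).
    M       : lump-sum retirement reward.
    beta    : discount factor in (0, 1), so the recursion is a contraction.
    """
    V = np.zeros(len(rewards))
    while True:
        # Worst-case expected next-stage value, minimized over nature's kernels.
        worst = np.min(kernels @ V, axis=0)            # shape (S,)
        V_new = np.maximum(M, rewards + beta * worst)  # retire vs. continue
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    retire = M >= rewards + beta * np.min(kernels @ V, axis=0)
    return V, retire

# Two states, two candidate kernels nature can choose between (made-up numbers).
rewards = np.array([1.0, 0.2])
kernels = np.array([[[0.9, 0.1], [0.3, 0.7]],
                    [[0.6, 0.4], [0.1, 0.9]]])
V, retire = robust_retirement_values(rewards, kernels, M=2.5)
print(V, retire)  # retire[x] is True where stopping is optimal in state x
```

A state-by-state retirement policy of the kind the paper establishes corresponds here to the boolean vector `retire`: the decision to stop depends only on the current state, not on the history of play.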
Similar resources
A Parallel Program for 3-arm Bandits
We describe a new parallel program for optimizing and analyzing 3-arm Bernoulli bandit problems. Previous researchers had considered this problem computationally intractable, and we know of no previous exact optimizations of 3-arm bandit problems. Despite this, our program is able to solve problems of size 100 or more. We describe the techniques used to achieve this, and indicate various extens...
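The snippet does not spell out what an exact optimization looks like. As a point of reference (an assumption of this note, not the paper's parallel program), here is exact backward induction for a two-arm Bernoulli bandit with independent uniform priors; the growth of its state space hints at why the three-arm case was long considered intractable.

```python
from functools import lru_cache

def bernoulli_bandit_value(horizon):
    """Exact value of a finite-horizon two-arm Bernoulli bandit with
    independent Beta(1, 1) priors, by backward induction over the posterior
    counts (s_i, f_i) of successes and failures observed on each arm."""
    @lru_cache(maxsize=None)
    def V(s1, f1, s2, f2):
        t = s1 + f1 + s2 + f2          # pulls made so far
        if t == horizon:
            return 0.0
        p1 = (s1 + 1) / (s1 + f1 + 2)  # posterior mean of arm 1
        p2 = (s2 + 1) / (s2 + f2 + 2)  # posterior mean of arm 2
        play1 = p1 * (1 + V(s1 + 1, f1, s2, f2)) + (1 - p1) * V(s1, f1 + 1, s2, f2)
        play2 = p2 * (1 + V(s1, f1, s2 + 1, f2)) + (1 - p2) * V(s1, f1, s2, f2 + 1)
        return max(play1, play2)
    return V(0, 0, 0, 0)

print(bernoulli_bandit_value(20))  # exact expected successes over 20 pulls
```

Adding a third arm raises the state from four counts to six, which is the combinatorial blow-up a parallel exact solver has to overcome.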
Thompson sampling with the online bootstrap
Thompson sampling provides a solution to bandit problems in which new observations are allocated to arms with the posterior probability that an arm is optimal. While sometimes easy to implement and asymptotically optimal, Thompson sampling can be computationally demanding in large scale bandit problems, and its performance is dependent on the model fit to the observed data. We introduce bootstr...
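As an illustration of the idea (a generic sketch, not necessarily the authors' exact algorithm), one online-bootstrap approximation keeps several bootstrap replicates of each arm's mean reward and, in place of a posterior draw, samples a replicate at random; the class name and initialization below are assumptions of this sketch.

```python
import random

class BootstrapThompson:
    """Thompson-style arm selection with an online bootstrap (sketch)."""

    def __init__(self, n_arms, n_replicates=100):
        # Each replicate starts with one pseudo-observation in [0, 1] so that
        # early selections are randomized rather than all tied at zero.
        self.counts = [[1.0 for _ in range(n_replicates)] for _ in range(n_arms)]
        self.sums = [[random.random() for _ in range(n_replicates)]
                     for _ in range(n_arms)]

    def select_arm(self):
        # Draw one bootstrap replicate per arm and play the arm whose drawn
        # mean estimate is largest -- the bootstrap analogue of sampling the
        # posterior probability that an arm is optimal.
        best_arm, best_value = 0, float("-inf")
        for arm, (counts, sums) in enumerate(zip(self.counts, self.sums)):
            j = random.randrange(len(counts))
            value = sums[j] / counts[j]
            if value > best_value:
                best_arm, best_value = arm, value
        return best_arm

    def update(self, arm, reward):
        # Online ("double or nothing") bootstrap: each replicate absorbs the
        # new observation independently with probability 1/2.
        for j in range(len(self.counts[arm])):
            if random.random() < 0.5:
                self.counts[arm][j] += 1.0
                self.sums[arm][j] += reward
```

The appeal is computational: updates are constant-time per replicate and require no conjugate posterior, at the cost of the approximation error the snippet alludes to.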
A stochastic bandit algorithm for scratch games
Stochastic multi-armed bandit algorithms are used to solve the exploration-exploitation dilemma in sequential optimization problems. Algorithms based on upper confidence bounds offer strong theoretical guarantees; they are easy to implement and efficient in practice. We consider a new bandit setting, called "scratch games", where arm budgets are limited and rewards are drawn without rep...
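The scratch-games algorithm itself is not reproduced in the snippet. For reference, the standard UCB1 rule that such upper-confidence-bound methods build on looks roughly like this (a textbook sketch, not the paper's budget-limited variant):

```python
import math

def ucb1_select(counts, sums, t):
    """UCB1 (Auer et al., 2002): play the arm maximizing the empirical mean
    plus the exploration bonus sqrt(2 ln t / n_a).

    counts : number of times each arm has been played (n_a).
    sums   : total reward collected on each arm.
    t      : total number of plays made so far.
    """
    for arm, n in enumerate(counts):
        if n == 0:  # play every arm once before the bound is defined
            return arm
    return max(range(len(counts)),
               key=lambda a: sums[a] / counts[a]
                             + math.sqrt(2.0 * math.log(t) / counts[a]))
```

The scratch-games setting additionally removes arms whose budget is exhausted and accounts for sampling without replacement, which is where it departs from this baseline.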
On the Optimal Amount of Experimentation in Sequential Decision Problems
We provide a tight bound on the amount of experimentation under the optimal strategy in sequential decision problems. We show the applicability of the result by providing a bound on the cut-off in a one-arm bandit problem.
Publication date: 2014